Data compression systems and methods

ABSTRACT

Data compression using a combination of content independent data compression and content dependent data compression. In one aspect, a system for compressing data comprises: a processor, and a plurality of data compression encoders wherein at least one data encoder utilizes asymmetric data compression. The processor is configured to determine one or more parameters, attributes, or values of the data within at least a portion of a data block containing either video or audio data, to select one or more data compression encoders from the plurality of data compression encoders based upon the determined one or more parameters, attributes, or values of the data and a throughput of a communications channel, and to perform data compression with the selected one or more data compression encoders on at least the portion of the data block.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 14/936,312, filed Nov. 9, 2015, which is a Continuation of U.S. patent application Ser. No. 14/727,309, filed Jun. 1, 2015, which is a Continuation of U.S. patent application Ser. No. 14/495,574, filed Sep. 24, 2014, now U.S. Pat. No. 9,054,728, which is a Continuation of U.S. patent application Ser. No. 14/251,453, filed Apr. 11, 2014, now U.S. Pat. No. 8,933,825, which is a Continuation of U.S. patent application Ser. No. 14/035,561, filed Sep. 24, 2013, now U.S. Pat. No. 8,717,203, which is a Continuation of U.S. patent application Ser. No. 13/154,211, now U.S. Pat. No. 8,643,513, filed Jun. 6, 2011, which is a Continuation of U.S. patent application Ser. No. 12/703,042, filed Feb. 9, 2010, now U.S. Pat. No. 8,502,707, which is a Continuation of both U.S. patent application Ser. No. 11/651,366, filed Jan. 8, 2007, now abandoned, and U.S. patent application Ser. No. 11/651,365, filed Jan. 8, 2007, now U.S. Pat. No. 7,714,747. Each of application Ser. No. 11/651,366 and application Ser. No. 11/651,365 is a Continuation of U.S. patent application Ser. No. 10/668,768, filed Sep. 22, 2003, now U.S. Pat. No. 7,161,506, which is a Continuation of U.S. patent application Ser. No. 10/016,355, filed Oct. 29, 2001, now U.S. Pat. No. 6,624,761, which is a Continuation-In-Part of U.S. patent application Ser. No. 09/705,446, filed Nov. 3, 2000, now U.S. Pat. No. 6,309,424, which is a Continuation of U.S. patent application Ser. No. 09/210,491, filed Dec. 11, 1998, which is now U.S. Pat. No. 6,195,024. Each of the listed applications are incorporated herein by reference in their entireties.

BACKGROUND

1. Technical Field

The present invention relates generally to data compression and decompression and, more particularly, to systems and methods for data compression using content independent and content dependent data compression and decompression.

2. Description of Related Art

Information may be represented in a variety of manners. Discrete information such as text and numbers is easily represented in digital data. This type of data representation is known as symbolic digital data. Symbolic digital data is thus an absolute representation of data such as a letter, figure, character, mark, machine code, or drawing.

Continuous information such as speech, music, audio, images and video, frequently exists in the natural world as analog information. As is well known to those skilled in the art, recent advances in very large scale integration (VLSI) digital computer technology have enabled both discrete and analog information to be represented with digital data. Continuous information represented as digital data is often referred to as diffuse data. Diffuse digital data is thus a representation of data that is of low information density and is typically not easily recognizable to humans in its native form.

There are many advantages associated with digital data representation. For instance, digital data is more readily processed, stored, and transmitted due to its inherently high noise immunity. In addition, the inclusion of redundancy in digital data representation enables error detection and/or correction. Error detection and/or correction capabilities are dependent upon the amount and type of data redundancy, available error detection and correction processing, and extent of data corruption.

One outcome of digital data representation is the continuing need for increased capacity in data processing, storage, and transmittal. This is especially true for diffuse data where increases in fidelity and resolution create exponentially greater quantities of data. Data compression is widely used to reduce the amount of data required to process, transmit, or store a given quantity of information. In general, there are two types of data compression techniques that may be utilized either separately or jointly to encode/decode data: lossless and lossy data compression.

Lossy data compression techniques provide for an inexact representation of the original uncompressed data such that the decoded (or reconstructed) data differs from the original unencoded/uncompressed data. Lossy data compression is also known as irreversible or noisy compression. Entropy is defined as the quantity of information in a given set of data. Thus, one obvious advantage of lossy data compression is that the compression ratios can be larger than the entropy limit, all at the expense of information content. Many lossy data compression techniques seek to exploit various traits within the human senses to eliminate otherwise imperceptible data. For example, lossy data compression of visual imagery might seek to delete information content in excess of the display resolution or contrast ratio.

On the other hand, lossless data compression techniques provide an exact representation of the original uncompressed data. Simply stated, the decoded (or reconstructed) data is identical to the original unencoded/uncompressed data. Lossless data compression is also known as reversible or noiseless compression. Thus, lossless data compression has, as its current limit, a minimum representation defined by the entropy of a given data set.

There are various problems associated with the use of lossless compression techniques. One fundamental problem encountered with most lossless data compression techniques is their content sensitive behavior. This is often referred to as data dependency. Data dependency implies that the compression ratio achieved is highly contingent upon the content of the data being compressed. For example, database files often have large unused fields and high data redundancies, offering the opportunity to losslessly compress data at ratios of 5 to 1 or more. In contrast, concise software programs have little to no data redundancy and, typically, will not losslessly compress better than 2 to 1.

Another problem with lossless compression is that there are significant variations in the compression ratio obtained when using a single lossless data compression technique for data streams having different data content and data size. This process is known as natural variation.

A further problem is that negative compression may occur when certain data compression techniques act upon many types of highly compressed data. Highly compressed data appears random, and many data compression techniques will substantially expand, not compress, this type of data.
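The data dependency and negative compression described above can be observed with any general-purpose lossless encoder. The following is a minimal sketch, assuming Python's standard `zlib` module as a stand-in encoder; the sample inputs and the helper name `compression_ratio` are illustrative and are not taken from this disclosure.

```python
import os
import zlib


def compression_ratio(original: bytes, encoded: bytes) -> float:
    """Ratio of input size to output size; values below 1.0 mean expansion."""
    return len(original) / len(encoded)


# Highly redundant input, loosely analogous to a file with many repeated fields.
redundant = b"ABCD" * 4096
# Random bytes stand in for data that has already been highly compressed.
random_like = os.urandom(16384)

ratio_redundant = compression_ratio(redundant, zlib.compress(redundant))
ratio_random = compression_ratio(random_like, zlib.compress(random_like))

print(f"redundant input: {ratio_redundant:.1f} to 1")
print(f"random input:    {ratio_random:.3f} to 1")
```

The same encoder gives a large ratio on the redundant input and a ratio slightly below 1 to 1 on the random input, which is the negative compression case.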

For a given application, there are many factors that govern the applicability of various data compression techniques. These factors include compression ratio, encoding and decoding processing requirements, encoding and decoding time delays, compatibility with existing standards, and implementation complexity and cost, along with the adaptability and robustness to variations in input data. A direct relationship exists in the current art between compression ratio and the amount and complexity of processing required. One of the limiting factors in most existing prior art lossless data compression techniques is the rate at which the encoding and decoding processes are performed. Hardware and software implementation tradeoffs are often dictated by encoder and decoder complexity along with cost.

Another problem associated with lossless compression methods is determining the optimal compression technique for a given set of input data and intended application. To combat this problem, there are many conventional content dependent techniques that may be utilized. For instance, file type descriptors are typically appended to file names to describe the application programs that normally act upon the data contained within the file. In this manner data types, data structures, and formats within a given file may be ascertained. Fundamental limitations with this content dependent technique include:

(1) the extremely large number of application programs, some of which do not possess published or documented file formats, data structures, or data type descriptors;

(2) the ability for any data compression supplier or consortium to acquire, store, and access the vast amounts of data required to identify known file descriptors and associated data types, data structures, and formats; and

(3) the rate at which new application programs are developed and the need to update file format data descriptions accordingly.

An alternative technique that approaches the problem of selecting an appropriate lossless data compression technique is disclosed, for example, in U.S. Pat. No. 5,467,087 to Chu entitled “High Speed Lossless Data Compression System” (“Chu”). FIG. 1 illustrates an embodiment of this data compression and decompression technique. Data compression 1 comprises two phases, a data pre-compression phase 2 and a data compression phase 3. Data decompression 4 of a compressed input data stream is also comprised of two phases, a data type retrieval phase 5 and a data decompression phase 6. During the data compression process 1, the data pre-compressor 2 accepts an uncompressed data stream, identifies the data type of the input stream, and generates a data type identification signal. The data compressor 3 selects a data compression method from a preselected set of methods to compress the input data stream, with the intention of producing the best available compression ratio for that particular data type.

There are several limitations associated with the Chu method. One such limitation is the need to unambiguously identify various data types. While these might include such common data types as ASCII, binary, or Unicode, there, in fact, exists a broad universe of data types that fall outside the three most common data types. Examples of these alternate data types include: signed and unsigned integers of various lengths, differing types and precision of floating point numbers, pointers, other forms of character text, and a multitude of user defined data types. Additionally, data types may be interspersed or partially compressed, making data type recognition difficult and/or impractical. Another limitation is that given a known data type, or mix of data types within a specific set or subset of input data, it may be difficult and/or impractical to predict which data encoding technique yields the highest compression ratio.

Accordingly, there is a need for a data compression system and method that would address limitations in conventional data compression techniques as described above.

SUMMARY OF THE INVENTION

The present invention is directed to systems and methods for providing fast and efficient data compression using a combination of content independent data compression and content dependent data compression. In one aspect of the invention, a method for compressing data comprises the steps of:

analyzing a data block of an input data stream to identify a data type of the data block, the input data stream comprising a plurality of disparate data types;

performing content dependent data compression on the data block, if the data type of the data block is identified; and

performing content independent data compression on the data block, if the data type of the data block is not identified.

In another aspect, the step of performing content independent data compression comprises: encoding the data block with a plurality of encoders to provide a plurality of encoded data blocks; determining a compression ratio obtained for each of the encoders; comparing each of the determined compression ratios with a first compression threshold; selecting for output the input data block and appending a null compression descriptor to the input data block, if all of the encoder compression ratios do not meet the first compression threshold; and selecting for output the encoded data block having the highest compression ratio and appending a corresponding compression type descriptor to the selected encoded data block, if at least one of the compression ratios meets the first compression threshold.

In another aspect, the step of performing content dependent compression comprises the steps of: selecting one or more encoders associated with the identified data type and encoding the data block with the selected encoders to provide a plurality of encoded data blocks; determining a compression ratio obtained for each of the selected encoders; comparing each of the determined compression ratios with a second compression threshold; selecting for output the input data block and appending a null compression descriptor to the input data block, if all of the encoder compression ratios do not meet the second compression threshold; and selecting for output the encoded data block having the highest compression ratio and appending a corresponding compression type descriptor to the selected encoded data block, if at least one of the compression ratios meets the second compression threshold.

In yet another aspect, the step of performing content independent data compression on the data block, if the data type of the data block is not identified, comprises the steps of: estimating a desirability of using one or more encoder types based on characteristics of the data block; and compressing the data block using one or more desirable encoders.

In another aspect, the step of performing content dependent data compression on the data block, if the data type of the data block is identified, comprises the steps of: estimating a desirability of using one or more encoder types based on characteristics of the data block; and compressing the data block using one or more desirable encoders.

In another aspect, the step of analyzing the data block comprises analyzing the data block to recognize one of a data type, data structure, data block format, file substructure, and/or file types. A further step comprises maintaining an association between encoder types and data types, data structures, data block formats, file substructure, and/or file types.

In yet another aspect of the invention, a method for compressing data comprises the steps of:

analyzing a data block of an input data stream to identify a data type of the data block, the input data stream comprising a plurality of disparate data types;

performing content dependent data compression on the data block, if the data type of the data block is identified;

determining a compression ratio of the compressed data block obtained using the content dependent compression and comparing the compression ratio with a first compression threshold; and

performing content independent data compression on the data block, if the data type of the data block is not identified or if the compression ratio of the compressed data block obtained using the content dependent compression does not meet the first compression threshold.
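The steps of this aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the ASCII test in `identify_data_type`, the association tables, and the choice of Python's `bz2`, `zlib`, and `lzma` modules as encoders are all hypothetical, and a ratio is treated as meeting the threshold only when it exceeds it.

```python
import bz2
import lzma
import zlib
from typing import Callable, Dict, List, Optional, Tuple

Encoder = Callable[[bytes], bytes]

# Hypothetical association between recognized data types and encoders.
CONTENT_DEPENDENT: Dict[str, List[Tuple[str, Encoder]]] = {
    "ascii_text": [("bz2", bz2.compress)],
}
# Hypothetical encoder set used when no data type is recognized.
CONTENT_INDEPENDENT: List[Tuple[str, Encoder]] = [
    ("zlib", zlib.compress),
    ("lzma", lzma.compress),
]
NULL_DESCRIPTOR = "null"


def identify_data_type(block: bytes) -> Optional[str]:
    """Toy recognizer (assumption): pure 7-bit blocks count as ASCII text."""
    if block and all(byte < 0x80 for byte in block):
        return "ascii_text"
    return None


def best_encoding(block: bytes, encoders: List[Tuple[str, Encoder]],
                  threshold: float) -> Optional[Tuple[str, bytes]]:
    """Return the highest-ratio encoding that exceeds the threshold, if any."""
    best: Optional[Tuple[str, bytes]] = None
    best_ratio = threshold
    for name, encode in encoders:
        encoded = encode(block)
        ratio = len(block) / len(encoded)
        if ratio > best_ratio:
            best, best_ratio = (name, encoded), ratio
    return best


def compress_block(block: bytes, threshold: float = 1.0) -> Tuple[str, bytes]:
    """Try content dependent compression first, then content independent."""
    result = None
    data_type = identify_data_type(block)
    if data_type is not None:
        result = best_encoding(block, CONTENT_DEPENDENT[data_type], threshold)
    if result is None:  # type not identified, or threshold not met
        result = best_encoding(block, CONTENT_INDEPENDENT, threshold)
    if result is None:  # nothing met the threshold: pass the block through
        return NULL_DESCRIPTOR, block
    return result
```

For example, `compress_block(b"the quick brown fox " * 200)` takes the content dependent path, while a block containing bytes above 0x7F goes straight to the content independent encoders.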

Advantageously, the present invention employs a plurality of encoders applying a plurality of compression techniques on an input data stream so as to achieve maximum compression in accordance with the real-time or pseudo real-time data rate constraint. Thus, the output bit rate is not fixed and the amount, if any, of permissible data quality degradation is user or data specified.

These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block/flow diagram of a content dependent high-speed lossless data compression and decompression system/method according to the prior art;

FIG. 2 is a block diagram of a content independent data compression system according to one embodiment of the present invention;

FIGS. 3a and 3b comprise a flow diagram of a data compression method according to one aspect of the present invention, which illustrates the operation of the data compression system of FIG. 2;

FIG. 4 is a block diagram of a content independent data compression system according to another embodiment of the present invention having an enhanced metric for selecting an optimal encoding technique;

FIGS. 5a and 5b comprise a flow diagram of a data compression method according to another aspect of the present invention, which illustrates the operation of the data compression system of FIG. 4;

FIG. 6 is a block diagram of a content independent data compression system according to another embodiment of the present invention having an a priori specified timer that provides real-time or pseudo real-time of output data;

FIGS. 7a and 7b comprise a flow diagram of a data compression method according to another aspect of the present invention, which illustrates the operation of the data compression system of FIG. 6;

FIG. 8 is a block diagram of a content independent data compression system according to another embodiment having an a priori specified timer that provides real-time or pseudo real-time of output data and an enhanced metric for selecting an optimal encoding technique;

FIG. 9 is a block diagram of a content independent data compression system according to another embodiment of the present invention having an encoding architecture comprising a plurality of sets of serially cascaded encoders;

FIGS. 10a and 10b comprise a flow diagram of a data compression method according to another aspect of the present invention, which illustrates the operation of the data compression system of FIG. 9;

FIG. 11 is a block diagram of a content independent data decompression system according to one embodiment of the present invention;

FIG. 12 is a flow diagram of a data decompression method according to one aspect of the present invention, which illustrates the operation of the data decompression system of FIG. 11;

FIGS. 13a and 13b comprise a block diagram of a data compression system comprising content dependent and content independent data compression, according to an embodiment of the present invention;

FIGS. 14a-14d comprise a flow diagram of a data compression method using both content dependent and content independent data compression, according to one aspect of the present invention;

FIGS. 15a and 15b comprise a block diagram of a data compression system comprising content dependent and content independent data compression, according to another embodiment of the present invention;

FIGS. 16a-16d comprise a flow diagram of a data compression method using both content dependent and content independent data compression, according to another aspect of the present invention;

FIGS. 17a and 17b comprise a block diagram of a data compression system comprising content dependent and content independent data compression, according to another embodiment of the present invention; and

FIGS. 18a-18d comprise a flow diagram of a data compression method using both content dependent and content independent data compression, according to another aspect of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is directed to systems and methods for providing data compression and decompression using content independent and content dependent data compression and decompression. In the following description, it is to be understood that system elements having equivalent or similar functionality are designated with the same reference numerals in the Figures. It is to be further understood that the present invention may be implemented in various forms of hardware, software, firmware, or a combination thereof. In particular, the system modules described herein are preferably implemented in software as an application program that is executable by, e.g., a general purpose computer or any machine or device having any suitable and preferred microprocessor architecture. Preferably, the present invention is implemented on a computer platform including hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or application programs which are executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.

It is to be further understood that, because some of the constituent system components described herein are preferably implemented as software modules, the actual system connections shown in the Figures may differ depending upon the manner in which the systems are programmed. It is to be appreciated that special purpose microprocessors may be employed to implement the present invention. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

Referring now to FIG. 2, a block diagram illustrates a content independent data compression system according to one embodiment of the present invention. The data compression system includes a counter module 10 that receives as input an uncompressed or compressed data stream. It is to be understood that the system processes the input data stream in data blocks that may range in size from individual bits through complete files or collections of multiple files. Additionally, the data block size may be fixed or variable. The counter module 10 counts the size of each input data block (i.e., the data block size is counted in bits, bytes, words, any convenient data multiple or metric, or any combination thereof).

An input data buffer 20, operatively connected to the counter module 10, may be provided for buffering the input data stream in order to output an uncompressed data stream in the event that, as discussed in further detail below, every encoder fails to achieve a level of compression that exceeds an a priori specified minimum compression ratio threshold. It is to be understood that the input data buffer 20 is not required for implementing the present invention.

An encoder module 30 is operatively connected to the buffer 20 and comprises a set of encoders E1, E2, E3 . . . En. The encoder set E1, E2, E3 . . . En may include any number “n” of those lossless encoding techniques currently well known within the art such as run length, Huffman, Lempel-Ziv Dictionary Compression, arithmetic coding, data compaction, and data null suppression. It is to be understood that the encoding techniques are selected based upon their ability to effectively encode different types of input data. It is to be appreciated that a full complement of encoders is preferably selected to provide a broad coverage of existing and future data types.

The encoder module 30 successively receives as input each of the buffered input data blocks (or unbuffered input data blocks from the counter module 10). Data compression is performed by the encoder module 30 wherein each of the encoders E1 . . . En processes a given input data block and outputs a corresponding set of encoded data blocks. It is to be appreciated that the system affords a user the option to enable/disable any one or more of the encoders E1 . . . En prior to operation. As is understood by those skilled in the art, such feature allows the user to tailor the operation of the data compression system for specific applications. It is to be further appreciated that the encoding process may be performed either in parallel or sequentially. In particular, the encoders E1 through En of encoder module 30 may operate in parallel (i.e., simultaneously processing a given input data block by utilizing task multiplexing on a single central processor, via dedicated hardware, by executing on a plurality of processors or dedicated hardware systems, or any combination thereof). In addition, encoders E1 through En may operate sequentially on a given unbuffered or buffered input data block. This process is intended to eliminate the complexity and additional processing overhead associated with multiplexing concurrent encoding techniques on a single central processor and/or dedicated hardware, set of central processors and/or dedicated hardware, or any achievable combination. It is to be further appreciated that encoders of the identical type may be applied in parallel to enhance encoding speed. For instance, encoder E1 may comprise two parallel Huffman encoders for parallel processing of an input data block.
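The sequential and parallel modes of the encoder module, together with the option to enable or disable individual encoders, can be sketched as follows. This is an illustrative sketch only: the encoder set is assumed to be Python's `zlib`, `bz2`, and `lzma` modules, and the function names and the use of a thread pool for the parallel mode are choices made for the example rather than features of the described system.

```python
import bz2
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical encoder set standing in for E1 . . . En.
ENCODERS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def encode_sequential(block: bytes, enabled: List[str]) -> Dict[str, bytes]:
    """Run each enabled encoder on the block, one after another."""
    return {name: ENCODERS[name](block) for name in enabled}


def encode_parallel(block: bytes, enabled: List[str]) -> Dict[str, bytes]:
    """Submit every enabled encoder to a worker pool and collect the outputs."""
    with ThreadPoolExecutor(max_workers=max(1, len(enabled))) as pool:
        futures = {name: pool.submit(ENCODERS[name], block) for name in enabled}
        return {name: future.result() for name, future in futures.items()}
```

Both functions return one encoded data block per enabled encoder, so the downstream buffering and ratio comparison do not need to know which mode was used. Whether the threaded mode is actually faster depends on the interpreter and the encoder implementations.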

A buffer/counter module 40 is operatively connected to the encoder module 30 for buffering and counting the size of each of the encoded data blocks output from encoder module 30. Specifically, the buffer/counter 40 comprises a plurality of buffer/counters BC1, BC2, BC3 . . . BCn, each operatively associated with a corresponding one of the encoders E1 . . . En. A compression ratio module 50, operatively connected to the output buffer/counter 40, determines the compression ratio obtained for each of the enabled encoders E1 . . . En by taking the ratio of the size of the input data block to the size of the output data block stored in the corresponding buffer/counters BC1 . . . BCn. In addition, the compression ratio module 50 compares each compression ratio with an a priori-specified compression ratio threshold limit to determine if at least one of the encoded data blocks output from the enabled encoders E1 . . . En achieves a compression that exceeds an a priori-specified threshold. As is understood by those skilled in the art, the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. A descriptor module 60, operatively coupled to the compression ratio module 50, appends a corresponding compression type descriptor to each encoded data block which is selected for output so as to indicate the type of compression format of the encoded data block.

The operation of the data compression system of FIG. 2 will now be discussed in further detail with reference to the flow diagram of FIGS. 3a and 3b. A data stream comprising one or more data blocks is input into the data compression system and the first data block in the stream is received (step 300). As stated above, data compression is performed on a per data block basis. Accordingly, the first input data block in the input data stream is input into the counter module 10 that counts the size of the data block (step 302). The data block is then stored in the buffer 20 (step 304). The data block is then sent to the encoder module 30 and compressed by each (enabled) encoder E1 . . . En (step 306). Upon completion of the encoding of the input data block, an encoded data block is output from each (enabled) encoder E1 . . . En and maintained in a corresponding buffer (step 308), and the encoded data block size is counted (step 310).

Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of each encoded data block output from the enabled encoders (step 312). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 314). It is to be understood that the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. It is to be further understood that notwithstanding that the current limit for lossless data compression is the entropy limit (the present definition of information content) for the data, the present invention does not preclude the use of future developments in lossless data compression that may increase lossless data compression ratios beyond what is currently known within the art.

After the compression ratios are compared with the threshold, a determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 316). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 316), then the original unencoded input data block is selected for output and a null data compression type descriptor is appended thereto (step 318). A null data compression type descriptor is defined as any recognizable data token or descriptor that indicates no data encoding has been applied to the input data block. Accordingly, the unencoded input data block with its corresponding null data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 320).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 316), then the encoded data block having the greatest compression ratio is selected (step 322). An appropriate data compression type descriptor is then appended (step 324). A data compression type descriptor is defined as any recognizable data token or descriptor that indicates which data encoding technique has been applied to the data. It is to be understood that, since encoders of the identical type may be applied in parallel to enhance encoding speed (as discussed above), the data compression type descriptor identifies the corresponding encoding technique applied to the encoded data block, not necessarily the specific encoder. The encoded data block having the greatest compression ratio along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 326).
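The per-block selection of steps 306 through 326 can be sketched as follows. This is a minimal illustration under stated assumptions: the encoder set is taken to be Python's `zlib`, `bz2`, and `lzma` modules, the descriptor is modeled as a string returned alongside the block rather than as a token appended to it, and the names `select_output` and `NULL_DESCRIPTOR` are hypothetical.

```python
import bz2
import lzma
import zlib
from typing import Callable, Dict, Tuple

# Hypothetical encoder set standing in for E1 . . . En.
ENCODERS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}
NULL_DESCRIPTOR = "null"  # indicates that no encoding was applied


def select_output(block: bytes, threshold: float = 1.0) -> Tuple[str, bytes]:
    """Return (descriptor, data) for one input data block."""
    input_size = len(block)  # counted once, as by the counter module
    # Start from the pass-through case; a candidate must exceed the threshold.
    best_name, best_data, best_ratio = NULL_DESCRIPTOR, block, threshold
    for name, encode in ENCODERS.items():
        encoded = encode(block)            # step 306
        ratio = input_size / len(encoded)  # step 312
        if ratio > best_ratio:             # steps 314, 316 and 322
            best_name, best_data, best_ratio = name, encoded, ratio
    return best_name, best_data            # steps 318-320 or 324-326
```

Because the descriptor names the encoding technique, a decoder only needs a table from descriptor to decoding routine, for example `{"zlib": zlib.decompress, "bz2": bz2.decompress, "lzma": lzma.decompress}`, plus a pass-through for the null descriptor.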

After the encoded data block or the unencoded input data block is output (steps 326 and 320), a determination is made as to whether the input data stream contains additional data blocks to be processed (step 328). If the input data stream includes additional data blocks (affirmative result in step 328), the next successive data block is received (step 330), its block size is counted (return to step 302) and the data compression process is repeated. This process is iterated for each data block in the input data stream. Once the final input data block is processed (negative result in step 328), data compression of the input data stream is finished (step 332).

Since a multitude of data types may be present within a given input data block, it is often difficult and/or impractical to predict the level of compression that will be achieved by a specific encoder. Consequently, by processing the input data blocks with a plurality of encoding techniques and comparing the compression results, content free data compression is advantageously achieved. It is to be appreciated that this approach is scalable through future generations of processors, dedicated hardware, and software. As processing capacity increases and costs reduce, the benefits provided by the present invention will continue to increase. It should again be noted that the present invention may employ any lossless data encoding technique.

Referring now to FIG. 4, a block diagram illustrates a content independent data compression system according to another embodiment of the present invention. The data compression system depicted in FIG. 4 is similar to the data compression system of FIG. 2 except that the embodiment of FIG. 4 includes an enhanced metric functionality for selecting an optimal encoding technique. In particular, each of the encoders E1 . . . En in the encoder module 30 is tagged with a corresponding one of user-selected encoder desirability factors 70. Encoder desirability is defined as an a priori user specified factor that takes into account any number of user considerations including, but not limited to, compatibility of the encoded data with existing standards, data error robustness, or any other aggregation of factors that the user wishes to consider for a particular application. Each encoded data block output from the encoder module 30 has a corresponding desirability factor appended thereto. A figure of merit module 80, operatively coupled to the compression ratio module 50 and the descriptor module 60, is provided for calculating a figure of merit for each of the encoded data blocks which possess a compression ratio greater than the compression ratio threshold limit. The figure of merit for each encoded data block is comprised of a weighted average of the a priori user specified threshold and the corresponding encoder desirability factor. As discussed below in further detail with reference to FIGS. 5a and 5b, the figure of merit substitutes the a priori user compression threshold limit for selecting and outputting encoded data blocks.
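One way to read the figure of merit is sketched below. This is a hedged illustration, not the disclosed formula: it assumes that the quantity combined with the desirability factor is each encoder's measured compression ratio, that both are on comparable numeric scales, and that a single weight `ratio_weight` sets their relative importance. All names and sample values are hypothetical.

```python
from typing import Dict, Optional


def figure_of_merit(ratio: float, desirability: float,
                    ratio_weight: float = 0.5) -> float:
    """Weighted average of a compression ratio and a desirability factor."""
    return ratio_weight * ratio + (1.0 - ratio_weight) * desirability


def select_by_merit(ratios: Dict[str, float], desirability: Dict[str, float],
                    threshold: float, ratio_weight: float = 0.5) -> Optional[str]:
    """Pick the encoder with the greatest figure of merit.

    Only encoders whose ratio exceeds the threshold are considered;
    None signals that the unencoded block should be output instead.
    """
    merits = {
        name: figure_of_merit(ratio, desirability[name], ratio_weight)
        for name, ratio in ratios.items()
        if ratio > threshold
    }
    if not merits:
        return None
    return max(merits, key=merits.get)


# Illustrative values only: measured ratios and user-assigned factors.
ratios = {"huffman": 2.0, "lz": 3.0, "rle": 0.9}
desirability = {"huffman": 5.0, "lz": 1.0, "rle": 9.0}
choice = select_by_merit(ratios, desirability, threshold=1.0)
```

With equal weighting, the encoder with the lower ratio but higher desirability is chosen here. Setting `ratio_weight=1.0` reduces the selection to the plain highest-ratio rule of FIG. 2.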

The operation of the data compression system of FIG. 4 will now be discussed in further detail with reference to the flow diagram of FIGS. 5a and 5b. A data stream comprising one or more data blocks is input into the data compression system and the first data block in the stream is received (step 500). The size of the first data block is then determined by the counter module 10 (step 502). The data block is then stored in the buffer 20 (step 504). The data block is then sent to the encoder module 30 and compressed by each (enabled) encoder in the encoder set E1 . . . En (step 506). Each encoded data block processed in the encoder module 30 is tagged with an encoder desirability factor that corresponds to the particular encoding technique applied to the encoded data block (step 508). Upon completion of the encoding of the input data block, an encoded data block with its corresponding desirability factor is output from each (enabled) encoder E1 . . . En and maintained in a corresponding buffer (step 510), and the encoded data block size is counted (step 512).

Next, a compression ratio obtained by each enabled encoder is calculated by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of the encoded data block output from each enabled encoder (step 514). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 516). A determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 518). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 518), then the original unencoded input data block is selected for output and a null data compression type descriptor (as discussed above) is appended thereto (step 520). Accordingly, the original unencoded input data block with its corresponding null data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 522).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 518), then a figure of merit is calculated for each encoded data block having a compression ratio which exceeds the compression ratio threshold limit (step 524). Again, the figure of merit for a given encoded data block is comprised of a weighted average of the a priori user specified threshold and the corresponding encoder desirability factor associated with the encoded data block. Next, the encoded data block having the greatest figure of merit is selected for output (step 526). An appropriate data compression type descriptor is then appended (step 528) to indicate the data encoding technique applied to the encoded data block. The encoded data block (which has the greatest figure of merit) along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 530).
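The selection of steps 514 through 530 can be illustrated with a minimal Python sketch. It is not the implementation of the figure of merit module 80. The encoder table, the desirability values, the weight `ratio_weight`, and the string descriptors are all hypothetical. In particular, the sketch assumes the figure of merit is a weighted average of each block's achieved compression ratio and its encoder desirability factor, which is an interpretation rather than a formula given in the text.

```python
import bz2
import lzma
import zlib

# Hypothetical encoder table: descriptor -> (encode function, desirability factor).
# The desirability values are illustrative, not taken from the specification.
ENCODERS = {
    "zlib": (zlib.compress, 0.9),
    "bz2": (bz2.compress, 0.5),
    "lzma": (lzma.compress, 0.2),
}


def select_by_figure_of_merit(block, threshold=1.0, ratio_weight=0.5):
    """Return (descriptor, payload) for the candidate with the greatest figure of merit.

    Assumption: figure of merit = weighted average of the achieved compression
    ratio and the encoder desirability factor.
    """
    best = None
    for name, (encode, desirability) in ENCODERS.items():
        encoded = encode(block)
        ratio = len(block) / len(encoded)  # input size : encoded size
        if ratio <= threshold:
            continue  # only blocks exceeding the threshold get a figure of merit
        merit = ratio_weight * ratio + (1.0 - ratio_weight) * desirability
        if best is None or merit > best[0]:
            best = (merit, name, encoded)
    if best is None:
        # No encoder exceeded the threshold: output the original block
        # with a null descriptor.
        return ("null", block)
    return (best[1], best[2])
```

With `ratio_weight=0.0` the choice is driven purely by desirability, and with `ratio_weight=1.0` it reduces to picking the greatest compression ratio, which shows how the weighting trades one consideration against the other.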

After the encoded data block or the unencoded input data block is output (step 530 or 522), a determination is made as to whether the input data stream contains additional data blocks to be processed (step 532). If the input data stream includes additional data blocks (affirmative result in step 532), then the next successive data block is received (step 534), its block size is counted (return to step 502) and the data compression process is iterated for each successive data block in the input data stream. Once the final input data block is processed (negative result in step 532), data compression of the input data stream is finished (step 536).

Referring now to FIG. 6, a block diagram illustrates a data compression system according to another embodiment of the present invention. The data compression system depicted in FIG. 6 is similar to the data compression system discussed in detail above with reference to FIG. 2 except that the embodiment of FIG. 6 includes an a priori specified timer that provides real-time or pseudo real-time output data. In particular, an interval timer 90, operatively coupled to the encoder module 30, is preloaded with a user specified time value. The role of the interval timer (as will be explained in greater detail below with reference to FIGS. 7a and 7b) is to limit the processing time for each input data block processed by the encoder module 30 so as to ensure that the real-time, pseudo real-time, or other time critical nature of the data compression processes is preserved.

The operation of the data compression system of FIG. 6 will now be discussed in further detail with reference to the flow diagram of FIGS. 7a and 7b. A data stream comprising one or more data blocks is input into the data compression system and the first data block in the data stream is received (step 700), and its size is determined by the counter module 10 (step 702). The data block is then stored in buffer 20 (step 704).

Next, concurrent with the completion of the receipt and counting of the first data block, the interval timer 90 is initialized (step 706) and starts counting towards a user-specified time limit. The input data block is then sent to the encoder module 30 wherein data compression of the data block by each (enabled) encoder E1 . . . En commences (step 708). Next, a determination is made as to whether the user specified time expires before the completion of the encoding process (steps 710 and 712). If the encoding process is completed before or at the expiration of the timer, i.e., each encoder (E1 through En) completes its respective encoding process (negative result in step 710 and affirmative result in step 712), then an encoded data block is output from each (enabled) encoder E1 . . . En and maintained in a corresponding buffer (step 714).

On the other hand, if the timer expires (affirmative result in step 710), the encoding process is halted (step 716). Then, encoded data blocks from only those enabled encoders E1 . . . En that have completed the encoding process are selected and maintained in buffers (step 718). It is to be appreciated that it is not necessary (or in some cases desirable) that some or all of the encoders complete the encoding process before the interval timer expires. Specifically, due to encoder data dependency and natural variation, it is possible that certain encoders may not operate quickly enough and, therefore, do not comply with the timing constraints of the end use. Accordingly, the time limit ensures that the real-time or pseudo real-time nature of the data encoding is preserved.

After the encoded data blocks are buffered (step 714 or 718), the size of each encoded data block is counted (step 720). Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of the encoded data block output from each enabled encoder (step 722). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 724). A determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 726). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 726), then the original unencoded input data block is selected for output and a null data compression type descriptor is appended thereto (step 728). The original unencoded input data block with its corresponding null data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 730).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 726), then the encoded data block having the greatest compression ratio is selected (step 732). An appropriate data compression type descriptor is then appended (step 734). The encoded data block having the greatest compression ratio along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 736).
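A rough Python sketch of the time-limited flow of FIGS. 7a and 7b follows, under several assumptions. The encoders run as threads, the function name `encode_within_time_limit` and its parameters are hypothetical, and an encoder that misses the limit is simply ignored rather than halted, since Python threads cannot be forcibly stopped. Only encoders that finish within the limit compete on compression ratio.

```python
import bz2
import time
import zlib
from concurrent.futures import ThreadPoolExecutor, wait


def encode_within_time_limit(block, encoders, time_limit_s, threshold=1.0):
    """Run every encoder in parallel and keep only results ready within the limit.

    `encoders` maps a descriptor name to an encode function (hypothetical layout).
    Returns (descriptor, payload); the descriptor is "null" when no completed
    encoder exceeds the compression ratio threshold.
    """
    pool = ThreadPoolExecutor(max_workers=len(encoders))
    futures = {pool.submit(fn, block): name for name, fn in encoders.items()}
    # Wait until every encoder finishes or the interval timer expires.
    done, _late = wait(futures, timeout=time_limit_s)
    # Late encoders are not interrupted here; their results are never read.
    pool.shutdown(wait=False)

    best_name, best_payload, best_ratio = "null", block, threshold
    for future in done:
        if future.exception() is not None:
            continue  # a failed encoder contributes no candidate
        encoded = future.result()
        ratio = len(block) / len(encoded)
        if ratio > best_ratio:
            best_name, best_payload, best_ratio = futures[future], encoded, ratio
    return best_name, best_payload
```

A call such as `encode_within_time_limit(data, {"zlib": zlib.compress, "bz2": bz2.compress}, 0.05)` would return whichever completed encoder compressed best, or the original block with a `"null"` descriptor.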

After the encoded data block or the unencoded input data block is output (step 730 or 736), a determination is made as to whether the input data stream contains additional data blocks to be processed (step 738). If the input data stream includes additional data blocks (affirmative result in step 738), the next successive data block is received (step 740), its block size is counted (return to step 702) and the data compression process is repeated. This process is iterated for each data block in the input data stream, with each data block being processed within the user-specified time limit as discussed above. Once the final input data block is processed (negative result in step 738), data compression of the input data stream is complete (step 742).

Referring now to FIG. 8, a block diagram illustrates a content independent data compression system according to another embodiment of the present system. The data compression system of FIG. 8 incorporates all of the features discussed above in connection with the system embodiments of FIGS. 2, 4, and 6. For example, the system of FIG. 8 incorporates both the a priori specified timer for providing real-time or pseudo real-time output data, as well as the enhanced metric for selecting an optimal encoding technique. Based on the foregoing discussion, the operation of the system of FIG. 8 is understood by those skilled in the art.

Referring now to FIG. 9, a block diagram illustrates a data compression system according to a preferred embodiment of the present invention. The system of FIG. 9 contains many of the features of the previous embodiments discussed above. However, this embodiment advantageously includes a cascaded encoder module 30c having an encoding architecture comprising a plurality of sets of serially cascaded encoders Em,n, where “m” refers to the encoding path (i.e., the encoder set) and where “n” refers to the number of encoders in the respective path. It is to be understood that each set of serially cascaded encoders can include any number of disparate and/or similar encoders (i.e., n can be any value for a given path m).

The system of FIG. 9 also includes an output buffer module 40c which comprises a plurality of buffer/counters B/Cm,n, each associated with a corresponding one of the encoders Em,n. In this embodiment, an input data block is sequentially applied to successive encoders (encoder stages) in the encoder path so as to increase the data compression ratio. For example, the output data block from a first encoder E1,1 is buffered and counted in B/C1,1 for subsequent processing by a second encoder E1,2. Advantageously, these parallel sets of sequential encoders are applied to the input data stream to effect content free lossless data compression. This embodiment provides for multi-stage sequential encoding of data with the maximum number of encoding steps subject to the available real-time, pseudo real-time, or other timing constraints.

As with each previously discussed embodiment, the encoders Em,n may include those lossless encoding techniques currently well known within the art, including: run length, Huffman, Lempel-Ziv Dictionary Compression, arithmetic coding, data compaction, and data null suppression. Encoding techniques are selected based upon their ability to effectively encode different types of input data. A full complement of encoders provides for broad coverage of existing and future data types. The input data blocks may be applied simultaneously to the encoder paths (i.e., the encoder paths may operate in parallel, utilizing task multiplexing on a single central processor, or via dedicated hardware, or by executing on a plurality of processors or dedicated hardware systems, or any combination thereof). In addition, an input data block may be sequentially applied to the encoder paths. Moreover, each serially cascaded encoder path may comprise a fixed (predetermined) sequence of encoders or a random sequence of encoders. Advantageously, by simultaneously or sequentially processing input data blocks via a plurality of sets of serially cascaded encoders, content free data compression is achieved.

The operation of the data compression system of FIG. 9 will now be discussed in further detail with reference to the flow diagram of FIGS. 10a and 10b. A data stream comprising one or more data blocks is input into the data compression system and the first data block in the data stream is received (step 100), and its size is determined by the counter module 10 (step 102). The data block is then stored in buffer 20 (step 104).

Next, concurrent with the completion of the receipt and counting of the first data block, the interval timer 90 is initialized (step 106) and starts counting towards a user-specified time limit. The input data block is then sent to the cascade encoder module 30c wherein the input data block is applied to the first encoder (i.e., first encoding stage) in each of the cascaded encoder paths E1,1 . . . Em,1 (step 108). Next, a determination is made as to whether the user specified time expires before the completion of the first stage encoding process (steps 110 and 112). If the first stage encoding process is completed before the expiration of the timer, i.e., each encoder (E1,1 . . . Em,1) completes its respective encoding process (negative result in step 110 and affirmative result in step 112), then an encoded data block is output from each encoder E1,1 . . . Em,1 and maintained in a corresponding buffer (step 114). Then for each cascade encoder path, the output of the completed encoding stage is applied to the next successive encoding stage in the cascade path (step 116). This process (steps 110, 112, 114, and 116) is repeated until the earlier of the timer expiration (affirmative result in step 110) or the completion of encoding by each encoder stage in the serially cascaded paths, at which time the encoding process is halted (step 118).

Then, for each cascade encoder path, the buffered encoded data block output by the last encoder stage that completes the encoding process before the expiration of the timer is selected for further processing (step 120). Advantageously, the interim stages of the multi-stage data encoding process are preserved. For example, the results of encoder E1,1 are preserved even after encoder E1,2 begins encoding the output of encoder E1,1. If the interval timer expires after encoder E1,1 completes its respective encoding process but before encoder E1,2 completes its respective encoding process, the encoded data block from encoder E1,1 is complete and is utilized for calculating the compression ratio for the corresponding encoder path. The incomplete encoded data block from encoder E1,2 is either discarded or ignored.
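The preservation of interim stage results in a single cascade path can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation. `run_cascade_path`, its `deadline` parameter, and the injectable `clock` are hypothetical names, and a stage that overruns the deadline is allowed to finish and is then discarded rather than being interrupted partway through.

```python
import bz2
import time
import zlib


def run_cascade_path(block, stages, deadline, clock=time.monotonic):
    """Apply the encoder stages of one cascade path in order.

    Returns (stages_completed, output), where output is the result of the last
    stage that completed by the deadline (the original block if none did).
    """
    completed = 0
    current = block  # buffered result of the last stage that finished in time
    for stage in stages:
        if clock() >= deadline:
            break  # timer already expired: do not start another stage
        candidate = stage(current)
        if clock() > deadline:
            break  # stage finished late: discard its output, keep the interim one
        current = candidate
        completed += 1
    return completed, current
```

For example, `run_cascade_path(data, [zlib.compress, bz2.compress], time.monotonic() + 0.01)` would yield the output of the second stage if both finish in time, and otherwise fall back to the first stage's output or to the unencoded block.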

It is to be appreciated that it is not necessary (or in some cases desirable) that some or all of the encoders in the cascade encoder paths complete the encoding process before the interval timer expires. Specifically, due to encoder data dependency, natural variation and the sequential application of the cascaded encoders, it is possible that certain encoders may not operate quickly enough and therefore do not comply with the timing constraints of the end use. Accordingly, the time limit ensures that the real-time or pseudo real-time nature of the data encoding is preserved.

After the encoded data blocks are selected (step 120), the size of each encoded data block is counted (step 122). Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of the encoded data block output from each encoder (step 124). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 126). A determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 128). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 128), then the original unencoded input data block is selected for output and a null data compression type descriptor is appended thereto (step 130). The original unencoded data block and its corresponding null data compression type descriptor are then output for subsequent data processing, storage, or transmittal (step 132).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 128), then a figure of merit is calculated for each encoded data block having a compression ratio which exceeds the compression ratio threshold limit (step 134). Again, the figure of merit for a given encoded data block is comprised of a weighted average of the a priori user specified threshold and the corresponding encoder desirability factor associated with the encoded data block. Next, the encoded data block having the greatest figure of merit is selected (step 136). An appropriate data compression type descriptor is then appended (step 138) to indicate the data encoding technique applied to the encoded data block. For instance, the data compression type descriptor can indicate that the encoded data block was processed by a single encoding type, a plurality of sequential encoding types, or a plurality of random encoding types. The encoded data block (which has the greatest figure of merit) along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 140).

After the unencoded data block or the encoded data block is output (step 132 or 140), a determination is made as to whether the input data stream contains additional data blocks to be processed (step 142). If the input data stream includes additional data blocks (affirmative result in step 142), then the next successive data block is received (step 144), its block size is counted (return to step 102) and the data compression process is iterated for each successive data block in the input data stream. Once the final input data block is processed (negative result in step 142), data compression of the input data stream is finished (step 146).

Referring now to FIG. 11, a block diagram illustrates a data decompression system according to one embodiment of the present invention. The data decompression system preferably includes an input buffer 1100 that receives as input an uncompressed or compressed data stream comprising one or more data blocks. The data blocks may range in size from individual bits through complete files or collections of multiple files. Additionally, the data block size may be fixed or variable. The input data buffer 1100 is preferably included (not required) to provide storage of input data for various hardware implementations. A descriptor extraction module 1102 receives the buffered (or unbuffered) input data block and then lexically, syntactically, or otherwise parses and analyzes the input data block using methods known by those skilled in the art to extract the data compression type descriptor associated with the data block. The data compression type descriptor may possess values corresponding to null (no encoding applied), a single applied encoding technique, or multiple encoding techniques applied in a specific or random order (in accordance with the data compression system embodiments and methods discussed above).

A decoder module 1104 includes a plurality of decoders D1 . . . Dn for decoding the input data block using a decoder, set of decoders, or a sequential set of decoders corresponding to the extracted compression type descriptor. The decoders D1 . . . Dn may include decoders corresponding to those lossless encoding techniques currently well known within the art, including: run length, Huffman, Lempel-Ziv Dictionary Compression, arithmetic coding, data compaction, and data null suppression. Decoding techniques are selected based upon their ability to effectively decode the various different types of encoded input data generated by the data compression systems described above or originating from any other desired source. As with the data compression systems discussed above, the decoder module 1104 may include multiple decoders of the same type applied in parallel so as to reduce the data decoding time.

The data decompression system also includes an output data buffer 1106 for buffering the decoded data block output from the decoder module 1104.

The operation of the data decompression system of FIG. 11 will be discussed in further detail with reference to the flow diagram of FIG. 12. A data stream comprising one or more data blocks of compressed or uncompressed data is input into the data decompression system and the first data block in the stream is received (step 1200) and maintained in the buffer (step 1202). As with the data compression systems discussed above, data decompression is performed on a per data block basis. The data compression type descriptor is then extracted from the input data block (step 1204). A determination is then made as to whether the data compression type descriptor is null (step 1206). If the data compression type descriptor is determined to be null (affirmative result in step 1206), then no decoding is applied to the input data block and the original undecoded data block is output (or maintained in the output buffer) (step 1208).

On the other hand, if the data compression type descriptor is determined to be any value other than null (negative result in step 1206), the corresponding decoder or decoders are then selected (step 1210) from the available set of decoders D1 . . . Dn in the decoding module 1104. It is to be understood that the data compression type descriptor may mandate the application of: a single specific decoder, an ordered sequence of specific decoders, a random order of specific decoders, a class or family of decoders, a mandatory or optional application of parallel decoders, or any combination or permutation thereof. The input data block is then decoded using the selected decoders (step 1212), and output (or maintained in the output buffer 1106) for subsequent data processing, storage, or transmittal (step 1214). A determination is then made as to whether the input data stream contains additional data blocks to be processed (step 1216). If the input data stream includes additional data blocks (affirmative result in step 1216), the next successive data block is received (step 1220), and buffered (return to step 1202). Thereafter, the data decompression process is iterated for each data block in the input data stream. Once the final input data block is processed (negative result in step 1216), data decompression of the input data stream is finished (step 1218).
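The descriptor-driven decoding of FIG. 12 can be illustrated with a short Python sketch. The one-byte descriptor placed at the front of each block, the numeric descriptor values, and the decoder table are assumptions made for illustration only, since the text does not prescribe a descriptor format. Each non-null descriptor maps to an ordered sequence of decoders, which covers both the single-decoder case and the cascaded case.

```python
import bz2
import lzma
import zlib

NULL_DESCRIPTOR = 0  # hypothetical value meaning "no encoding applied"

# Hypothetical table: descriptor -> ordered sequence of decoders to apply.
DECODERS = {
    1: (zlib.decompress,),
    2: (bz2.decompress,),
    3: (lzma.decompress,),
    4: (bz2.decompress, zlib.decompress),  # block was encoded zlib first, then bz2
}


def decode_block(block):
    """Extract the descriptor, select the decoder(s), and decode one block."""
    descriptor, payload = block[0], block[1:]
    if descriptor == NULL_DESCRIPTOR:
        return payload  # null descriptor: pass the data through undecoded
    if descriptor not in DECODERS:
        raise ValueError(f"unknown compression type descriptor: {descriptor}")
    for decode in DECODERS[descriptor]:
        payload = decode(payload)
    return payload


def decode_stream(blocks):
    """Decode each block of the stream in turn and concatenate the results."""
    return b"".join(decode_block(block) for block in blocks)
```

Because the descriptor names the technique rather than a particular decoder instance, several decoders of the same type could serve the same descriptor value in parallel.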

In other embodiments of the present invention described below, data compression is achieved using a combination of content dependent data compression and content independent data compression. For example, FIGS. 13a and 13b are block diagrams illustrating a data compression system employing both content independent and content dependent data compression according to one embodiment of the present invention, wherein content independent data compression is applied to a data block when the content of the data block cannot be identified or is not associable with a specific data compression algorithm. The data compression system comprises a counter module 10 that receives as input an uncompressed or compressed data stream. It is to be understood that the system processes the input data stream in data blocks that may range in size from individual bits through complete files or collections of multiple files. Additionally, the data block size may be fixed or variable. The counter module 10 counts the size of each input data block (i.e., the data block size is counted in bits, bytes, words, any convenient data multiple or metric, or any combination thereof).

An input data buffer 20, operatively connected to the counter module 10, may be provided for buffering the input data stream in order to output an uncompressed data stream in the event that, as discussed in further detail below, every encoder fails to achieve a level of compression that exceeds a priori specified content independent or content dependent minimum compression ratio thresholds. It is to be understood that the input data buffer 20 is not required for implementing the present invention.

A content dependent data recognition module 1300 analyzes the incoming data stream to recognize data types, data structures, data block formats, file substructures, file types, and/or any other parameters that may be indicative of either the data type/content of a given data block or the appropriate data compression algorithm or algorithms (in serial or in parallel) to be applied. Optionally, a data file recognition list(s) or algorithm(s) module 1310 may be employed to hold and/or determine associations between recognized data parameters and appropriate algorithms. Each data block that is recognized by the content dependent data recognition module 1300 is routed to a content dependent encoder module 1320; if the data block is not recognized, it is routed to the content independent encoder module 30.
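The routing decision can be illustrated with a small Python sketch in which recognition is done by matching leading signature bytes. This is only one possible recognition strategy and is an assumption made here, not a method stated in the text. The `SIGNATURES` table stands in for the recognition list 1310, and the labels returned by `route` are hypothetical.

```python
# Hypothetical recognition list: leading signature bytes -> recognized data type.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image",   # PNG file signature
    b"RIFF": "audio_or_video",       # RIFF container (e.g. WAV, AVI)
    b"ID3": "audio",                 # MP3 file beginning with an ID3v2 tag
}


def recognize(block):
    """Return the recognized data type of a block, or None if unrecognized."""
    for magic, kind in SIGNATURES.items():
        if block.startswith(magic):
            return kind
    return None


def route(block):
    """Choose the encoder module for a block based on the recognition result."""
    kind = recognize(block)
    if kind is not None:
        return ("content_dependent", kind)   # would go to encoder module 1320
    return ("content_independent", None)     # would go to encoder module 30
```

A block whose content matches no entry simply falls through to the content independent path, which mirrors the fallback behavior described above.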

A content dependent encoder module 1320 is operatively connected to the content dependent data recognition module 1300 and comprises a set of encoders D1, D2, D3 . . . Dm. The encoder set D1, D2, D3 . . . Dm may include any number “m” of those lossless or lossy encoding techniques currently well known within the art such as MPEG4, various voice codecs, MPEG3, AC3, AAC, as well as lossless algorithms such as run length, Huffman, Lempel-Ziv Dictionary Compression, arithmetic coding, data compaction, and data null suppression. It is to be understood that the encoding techniques are selected based upon their ability to effectively encode different types of input data. It is to be appreciated that a full complement of encoders and/or codecs is preferably selected to provide a broad coverage of existing and future data types.

The content independent encoder module 30, which is operatively connected to the content dependent data recognition module 1300, comprises a set of encoders E1, E2, E3 . . . En. The encoder set E1, E2, E3 . . . En may include any number “n” of those lossless encoding techniques currently well known within the art such as run length, Huffman, Lempel-Ziv Dictionary Compression, arithmetic coding, data compaction, and data null suppression. Again, it is to be understood that the encoding techniques are selected based upon their ability to effectively encode different types of input data. It is to be appreciated that a full complement of encoders is preferably selected to provide a broad coverage of existing and future data types.

The encoder modules (content dependent 1320 and content independent 30) selectively receive the buffered input data blocks (or unbuffered input data blocks from the counter module 10) from module 1300 based on the results of recognition. Data compression is performed by the respective encoder modules wherein some or all of the encoders D1 . . . Dm or E1 . . . En process a given input data block and output a corresponding set of encoded data blocks. It is to be appreciated that the system affords a user the option to enable/disable any one or more of the encoders D1 . . . Dm and E1 . . . En prior to operation. As is understood by those skilled in the art, such a feature allows the user to tailor the operation of the data compression system for specific applications. It is to be further appreciated that the encoding process may be performed either in parallel or sequentially. In particular, the encoder set D1 through Dm of encoder module 1320 and/or the encoder set E1 through En of encoder module 30 may operate in parallel (i.e., simultaneously processing a given input data block by utilizing task multiplexing on a single central processor, via dedicated hardware, by executing on a plurality of processors or dedicated hardware systems, or any combination thereof). In addition, encoders D1 through Dm and E1 through En may operate sequentially on a given unbuffered or buffered input data block. This process is intended to eliminate the complexity and additional processing overhead associated with multiplexing concurrent encoding techniques on a single central processor and/or dedicated hardware, set of central processors and/or dedicated hardware, or any achievable combination. It is to be further appreciated that encoders of the identical type may be applied in parallel to enhance encoding speed. For instance, encoder E1 may comprise two parallel Huffman encoders for parallel processing of an input data block. It should be further noted that one or more algorithms may be implemented in dedicated hardware such as an MPEG4 or MP3 encoding integrated circuit.

Buffer/counter modules 1330 and 40 are operatively connected to their respective encoding modules 1320 and 30, for buffering and counting the size of each of the encoded data blocks output from the respective encoder modules. Specifically, the content dependent buffer/counter 1330 comprises a plurality of buffer/counters BCD1, BCD2, BCD3 . . . BCDm, each operatively associated with a corresponding one of the encoders D1 . . . Dm. Similarly, the content independent buffer/counter 40 comprises a plurality of buffer/counters BCE1, BCE2, BCE3 . . . BCEn, each operatively associated with a corresponding one of the encoders E1 . . . En. A compression ratio module 1340, operatively connected to the content dependent output buffer/counters 1330 and content independent buffer/counters 40, determines the compression ratio obtained for each of the enabled encoders D1 . . . Dm and/or E1 . . . En by taking the ratio of the size of the input data block to the size of the output data block stored in the corresponding buffer/counters BCD1, BCD2, BCD3 . . . BCDm and/or BCE1, BCE2, BCE3 . . . BCEn. In addition, the compression ratio module 1340 compares each compression ratio with an a priori-specified compression ratio threshold limit to determine if at least one of the encoded data blocks output from the enabled encoders and stored in the buffer/counters BCD1, BCD2, BCD3 . . . BCDm and/or BCE1, BCE2, BCE3 . . . BCEn achieves a compression that meets an a priori-specified threshold. As is understood by those skilled in the art, the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. It should be noted that different threshold values may be applied to content dependent and content independent encoded data. Further, these thresholds may be adaptively modified based upon enabled encoders in either or both the content dependent or content independent encoder sets, along with any associated parameters. A compression type description module 1350, operatively coupled to the compression ratio module 1340, appends a corresponding compression type descriptor to each encoded data block which is selected for output so as to indicate the type of compression format of the encoded data block.

A mode of operation of the data compression system of FIGS. 13a and 13b will now be discussed with reference to the flow diagrams of FIGS. 14a-14d, which illustrate a method for performing data compression using a combination of content dependent and content independent data compression. In general, content independent data compression is applied to a given data block when the content of a data block cannot be identified or is not associated with a specific data compression algorithm. More specifically, referring to FIG. 14a, a data stream comprising one or more data blocks is input into the data compression system and the first data block in the stream is received (step 1400). As stated above, data compression is performed on a per data block basis. As previously stated, a data block may represent any quantity of data from a single bit through a multiplicity of files or packets and may vary from block to block. Accordingly, the first input data block in the input data stream is input into the counter module 10 that counts the size of the data block (step 1402). The data block is then stored in the buffer 20 (step 1404). The data block is then analyzed on a per block or multi-block basis by the content dependent data recognition module 1300 (step 1406). If the data stream content is not recognized utilizing the recognition list(s) or algorithm(s) module 1310 (step 1408), the data is routed to the content independent encoder module 30 and compressed by each (enabled) encoder E1 . . . En (step 1410). Upon completion of the encoding of the input data block, an encoded data block is output from each (enabled) encoder E1 . . . En and maintained in a corresponding buffer (step 1412), and the encoded data block size is counted (step 1414).

Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of each encoded data block output from the enabled encoders (step 1416). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 1418). It is to be understood that the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. It is to be further understood that notwithstanding that the current limit for lossless data compression is the entropy limit (the present definition of information content) for the data, the present invention does not preclude the use of future developments in lossless data compression that may increase lossless data compression ratios beyond what is currently known within the art. Additionally, the content independent data compression threshold may be different from the content dependent threshold and either may be modified by the specific enabled encoders.

After the compression ratios are compared with the threshold, a determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 1420). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 1420), then the original unencoded input data block is selected for output and a null data compression type descriptor is appended thereto (step 1434). A null data compression type descriptor is defined as any recognizable data token or descriptor that indicates no data encoding has been applied to the input data block. Accordingly, the unencoded input data block with its corresponding null data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1436).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 1420), then the encoded data block having the greatest compression ratio is selected (step 1422). An appropriate data compression type descriptor is then appended (step 1424). A data compression type descriptor is defined as any recognizable data token or descriptor that indicates which data encoding technique has been applied to the data. It is to be understood that, since encoders of the identical type may be applied in parallel to enhance encoding speed (as discussed above), the data compression type descriptor identifies the corresponding encoding technique applied to the encoded data block, not necessarily the specific encoder. The encoded data block having the greatest compression ratio along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1426).
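
By way of illustration only, the content independent path described above (encode with every enabled encoder, compute each compression ratio, compare against the threshold, and append a descriptor) may be sketched as follows. The one-byte descriptor values, the choice of zlib, bz2, and lzma as stand-ins for encoders E1 . . . En, and the function names encode_block and decode_block are assumptions made for this sketch and are not taken from the specification.

```python
import bz2
import lzma
import zlib

# Hypothetical one-byte descriptors; the specification leaves the format open.
NULL_DESCRIPTOR = 0
ENCODERS = {1: zlib.compress, 2: bz2.compress, 3: lzma.compress}
DECODERS = {1: zlib.decompress, 2: bz2.decompress, 3: lzma.decompress}


def encode_block(block: bytes, threshold: float = 1.0) -> bytes:
    """Return the best encoded block (or the raw block) with a descriptor byte appended."""
    best_id, best_out, best_ratio = NULL_DESCRIPTOR, block, threshold
    for enc_id, encode in ENCODERS.items():
        out = encode(block)
        # Compression ratio = input block size / encoded block size.
        ratio = len(block) / len(out)
        # Keep only results that exceed the threshold and beat the best so far.
        if ratio > best_ratio:
            best_id, best_out, best_ratio = enc_id, out, ratio
    return best_out + bytes([best_id])


def decode_block(tagged: bytes) -> bytes:
    """Read the trailing descriptor and undo the corresponding encoding."""
    payload, enc_id = tagged[:-1], tagged[-1]
    if enc_id == NULL_DESCRIPTOR:
        return payload  # null descriptor: the block was never encoded
    return DECODERS[enc_id](payload)
```

In this sketch a block that no encoder can shrink is emitted unchanged with the null descriptor, so the decoder never attempts to decompress raw data.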

As previously stated, the data block stored in the buffer 20 (step 1404) is analyzed on a per block or multi-block basis by the content dependent data recognition module 1300 (step 1406). If the data stream content is recognized utilizing the recognition list(s) or algorithm(s) module 1310 (step 1434), the appropriate content dependent algorithms are enabled and initialized (step 1436), and the data is routed to the content dependent encoder module 1320 and compressed by each (enabled) encoder D1 . . . Dm (step 1438). Upon completion of the encoding of the input data block, an encoded data block is output from each (enabled) encoder D1 . . . Dm and maintained in a corresponding buffer (step 1440), and the encoded data block size is counted (step 1442).

Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of each encoded data block output from the enabled encoders (step 1444). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 1448). It is to be understood that the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. It is to be further understood that many of these algorithms may be lossy, and as such the limits may be subject to or modified by an end target storage, listening, or viewing device. Further, notwithstanding that the current limit for lossless data compression is the entropy limit (the present definition of information content) for the data, the present invention does not preclude the use of future developments in lossless data compression that may increase lossless data compression ratios beyond what is currently known within the art. Additionally, the content independent data compression threshold may be different from the content dependent threshold and either may be modified by the specific enabled encoders.

After the compression ratios are compared with the threshold, a determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 1420). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 1420), then the original unencoded input data block is selected for output and a null data compression type descriptor is appended thereto (step 1434). A null data compression type descriptor is defined as any recognizable data token or descriptor that indicates no data encoding has been applied to the input data block. Accordingly, the unencoded input data block with its corresponding null data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1436).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 1420), then the encoded data block having the greatest compression ratio is selected (step 1422). An appropriate data compression type descriptor is then appended (step 1424). A data compression type descriptor is defined as any recognizable data token or descriptor that indicates which data encoding technique has been applied to the data. It is to be understood that, since encoders of the identical type may be applied in parallel to enhance encoding speed (as discussed above), the data compression type descriptor identifies the corresponding encoding technique applied to the encoded data block, not necessarily the specific encoder. The encoded data block having the greatest compression ratio along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1426).

After the encoded data block or the unencoded input data block is output (steps 1426 and 1436), a determination is made as to whether the input data stream contains additional data blocks to be processed (step 1428). If the input data stream includes additional data blocks (affirmative result in step 1428), the next successive data block is received (step 1432), its block size is counted (return to step 1402) and the data compression process is repeated. This process is iterated for each data block in the input data stream. Once the final input data block is processed (negative result in step 1428), data compression of the input data stream is finished (step 1430).
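
As a hedged illustration of the overall per-block flow just described, the following sketch routes each block of a stream to content dependent encoders when a recognition list matches, and to content independent encoders otherwise. The signature b"SMPL", the delta_zlib encoder, the string descriptors, and all function names are hypothetical stand-ins introduced for this example; the specification does not prescribe them.

```python
import bz2
import zlib
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Encoder = Tuple[str, Callable[[bytes], bytes]]


def delta_zlib(block: bytes) -> bytes:
    """Hypothetical content dependent encoder: byte-wise delta, then zlib."""
    deltas = bytes((b - a) & 0xFF for a, b in zip(b"\x00" + block, block))
    return zlib.compress(deltas)


# Hypothetical recognition list: block signature -> content dependent encoders.
RECOGNITION_LIST: Dict[bytes, List[Encoder]] = {
    b"SMPL": [("delta+zlib", delta_zlib)],
}
CONTENT_INDEPENDENT: List[Encoder] = [("zlib", zlib.compress), ("bz2", bz2.compress)]


def recognize(block: bytes) -> Optional[List[Encoder]]:
    """Return the encoders associated with a recognized block, else None."""
    for signature, encoders in RECOGNITION_LIST.items():
        if block.startswith(signature):
            return encoders
    return None


def compress_stream(blocks: Iterable[bytes],
                    threshold: float = 1.0) -> Iterator[Tuple[str, bytes]]:
    """Yield (descriptor, data) for each block of the input stream."""
    for block in blocks:
        encoders = recognize(block) or CONTENT_INDEPENDENT
        best = ("null", block, threshold)  # null descriptor unless beaten
        for name, encode in encoders:
            out = encode(block)
            ratio = len(block) / len(out)
            if ratio > best[2]:
                best = (name, out, ratio)
        yield best[0], best[1]
```

The generator mirrors the loop over successive data blocks: each iteration handles one block and the stream ends when no blocks remain.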

Since a multitude of data types may be present within a given input data block, it is often difficult and/or impractical to predict the level of compression that will be achieved by a specific encoder. Consequently, by processing the input data blocks with a plurality of encoding techniques and comparing the compression results, content free data compression is advantageously achieved. Further, the encoding may be lossy or lossless dependent upon the input data types. Further, if the data type is not recognized, the default content independent lossless compression is applied. It is not a requirement that this process be deterministic; in fact, a certain probability may be applied if occasional data loss is permitted. It is to be appreciated that this approach is scalable through future generations of processors, dedicated hardware, and software. As processing capacity increases and costs reduce, the benefits provided by the present invention will continue to increase. It should again be noted that the present invention may employ any lossless data encoding technique.

FIGS. 15a and 15b are block diagrams illustrating a data compression system employing both content independent and content dependent data compression according to another embodiment of the present invention. The system in FIGS. 15a and 15b is similar in operation to the system of FIGS. 13a and 13b in that content independent data compression is applied to a data block when the content of the data block cannot be identified or is not associable with a specific data compression algorithm. The system of FIGS. 15a and 15b additionally performs content independent data compression on a data block when the compression ratio obtained for the data block using the content dependent data compression does not meet a specified threshold.

A mode of operation of the data compression system of FIGS. 15a and 15b will now be discussed with reference to the flow diagrams of FIGS. 16a-16d, which illustrate a method for performing data compression using a combination of content dependent and content independent data compression. A data stream comprising one or more data blocks is input into the data compression system and the first data block in the stream is received (step 1600). As stated above, data compression is performed on a per data block basis. As previously stated, a data block may represent any quantity of data from a single bit through a multiplicity of files or packets and may vary from block to block. Accordingly, the first input data block in the input data stream is input into the counter module 10 that counts the size of the data block (step 1602). The data block is then stored in the buffer 20 (step 1604). The data block is then analyzed on a per block or multi-block basis by the content dependent data recognition module 1300 (step 1606). If the data stream content is not recognized utilizing the recognition list(s) or algorithm(s) module 1310 (step 1608), the data is routed to the content independent encoder module 30 and compressed by each (enabled) encoder E1 . . . En (step 1610). Upon completion of the encoding of the input data block, an encoded data block is output from each (enabled) encoder E1 . . . En and maintained in a corresponding buffer (step 1612), and the encoded data block size is counted (step 1614).

Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of each encoded data block output from the enabled encoders (step 1616). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 1618). It is to be understood that the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. It is to be further understood that notwithstanding that the current limit for lossless data compression is the entropy limit (the present definition of information content) for the data, the present invention does not preclude the use of future developments in lossless data compression that may increase lossless data compression ratios beyond what is currently known within the art. Additionally, the content independent data compression threshold may be different from the content dependent threshold and either may be modified by the specific enabled encoders.

After the compression ratios are compared with the threshold, a determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 1620). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 1620), then the original unencoded input data block is selected for output and a null data compression type descriptor is appended thereto (step 1634). A null data compression type descriptor is defined as any recognizable data token or descriptor that indicates no data encoding has been applied to the input data block. Accordingly, the unencoded input data block with its corresponding null data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1636).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 1620), then the encoded data block having the greatest compression ratio is selected (step 1622). An appropriate data compression type descriptor is then appended (step 1624). A data compression type descriptor is defined as any recognizable data token or descriptor that indicates which data encoding technique has been applied to the data. It is to be understood that, since encoders of the identical type may be applied in parallel to enhance encoding speed (as discussed above), the data compression type descriptor identifies the corresponding encoding technique applied to the encoded data block, not necessarily the specific encoder. The encoded data block having the greatest compression ratio along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1626).

As previously stated, the data block stored in the buffer 20 (step 1604) is analyzed on a per block or multi-block basis by the content dependent data recognition module 1300 (step 1606). If the data stream content is recognized utilizing the recognition list(s) or algorithm(s) module 1310 (step 1634), the appropriate content dependent algorithms are enabled and initialized (step 1636) and the data is routed to the content dependent encoder module 1620 and compressed by each (enabled) encoder D1 . . . Dm (step 1638). Upon completion of the encoding of the input data block, an encoded data block is output from each (enabled) encoder D1 . . . Dm and maintained in a corresponding buffer (step 1640), and the encoded data block size is counted (step 1642).

Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of each encoded data block output from the enabled encoders (step 1644). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 1648). It is to be understood that the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. It is to be further understood that many of these algorithms may be lossy, and as such the limits may be subject to or modified by an end target storage, listening, or viewing device. Further, notwithstanding that the current limit for lossless data compression is the entropy limit (the present definition of information content) for the data, the present invention does not preclude the use of future developments in lossless data compression that may increase lossless data compression ratios beyond what is currently known within the art. Additionally, the content independent data compression threshold may be different from the content dependent threshold and either may be modified by the specific enabled encoders.

After the compression ratios are compared with the threshold, a determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 1648). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 1620), then the original unencoded input data block is routed to the content independent encoder module 30 and the process resumes with compression utilizing content independent encoders (step 1610).

After the encoded data block or the unencoded input data block is output (steps 1626 and 1636), a determination is made as to whether the input data stream contains additional data blocks to be processed (step 1628). If the input data stream includes additional data blocks (affirmative result in step 1628), the next successive data block is received (step 1632), its block size is counted (return to step 1602) and the data compression process is repeated. This process is iterated for each data block in the input data stream. Once the final input data block is processed (negative result in step 1628), data compression of the input data stream is finished (step 1630).
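
The distinguishing behavior of this embodiment, namely falling back to content independent encoding when no content dependent encoder meets its threshold, may be sketched as follows. This is a minimal illustration under stated assumptions: the helper names best_of and compress_with_fallback, the string descriptors, and the use of separate threshold arguments for the two encoder sets are hypothetical choices for this example.

```python
import zlib
from typing import Callable, List, Optional, Tuple

Encoder = Tuple[str, Callable[[bytes], bytes]]


def best_of(block: bytes, encoders: List[Encoder],
            threshold: float) -> Optional[Tuple[str, bytes]]:
    """Return (descriptor, data) for the best encoder exceeding the threshold."""
    best, best_ratio = None, threshold
    for name, encode in encoders:
        out = encode(block)
        ratio = len(block) / len(out)
        if ratio > best_ratio:
            best, best_ratio = (name, out), ratio
    return best


def compress_with_fallback(block: bytes,
                           dependent: List[Encoder],
                           independent: List[Encoder],
                           dep_threshold: float = 1.0,
                           ind_threshold: float = 1.0) -> Tuple[str, bytes]:
    """Try content dependent encoders first, then content independent ones."""
    result = best_of(block, dependent, dep_threshold) if dependent else None
    if result is None:
        # No content dependent result met its threshold: resume with the
        # content independent encoders on the original unencoded block.
        result = best_of(block, independent, ind_threshold)
    # If neither set met its threshold, emit the raw block with a null descriptor.
    return result if result is not None else ("null", block)
```

Passing an empty list for the content dependent encoders corresponds to an unrecognized block, which goes directly to the content independent set.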

FIGS. 17a and 17b are block diagrams illustrating a data compression system employing both content independent and content dependent data compression according to another embodiment of the present invention. The system in FIGS. 17a and 17b is similar in operation to the system of FIGS. 13a and 13b in that content independent data compression is applied to a data block when the content of the data block cannot be identified or is not associable with a specific data compression algorithm. The system of FIGS. 17a and 17b additionally uses a priori estimation algorithms or look-up tables to estimate the desirability of using content independent data compression encoders and/or content dependent data compression encoders and selecting appropriate algorithms or subsets thereof based on such estimation.

More specifically, a content dependent data recognition and/or estimation module 1700 is utilized to analyze the incoming data stream for recognition of data types, data structures, data block formats, file substructures, file types, or any other parameters that may be indicative of the appropriate data compression algorithm or algorithms (in serial or in parallel) to be applied. Optionally, a data file recognition list(s) or algorithm(s) module 1710 may be employed to hold associations between recognized data parameters and appropriate algorithms. If the content data compression module recognizes a portion of the data, that portion is routed to the content dependent encoder module 1320; if not, the data is routed to the content independent encoder module 30. It is to be appreciated that the process of recognition (modules 1700 and 1710) is not limited to a deterministic recognition, but may further comprise a probabilistic estimation of which encoders to select for compression from the set of encoders of the content dependent module 1320 or the content independent module 30. For example, a method may be employed to compute statistics of a data block whereby a determination that the locality of repetition of characters in a data stream is high can suggest a text document, which may be beneficially compressed with a lossless dictionary type algorithm. Further, the statistics of repeated characters and relative frequencies may suggest a specific type of dictionary algorithm. Long strings will require a wide dictionary file while a wide diversity of strings may suggest a deep dictionary. Statistics may also be utilized in algorithms such as Huffman where various character statistics will dictate the choice of different Huffman compression tables. This technique is not limited to lossless algorithms but may be widely employed with lossy algorithms. Header information in frames for video files can imply a specific data resolution. The estimator then may select the appropriate lossy compression algorithm and compression parameters (amount of resolution desired). As shown in previous embodiments of the present invention, desirability of various algorithms and now associated resolutions with lossy type algorithms may also be applied in the estimation selection process.
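
One possible way to compute block statistics of the kind mentioned above is sketched below, purely as an illustration. The two measures (byte entropy and the fraction of three-byte sequences that recur within a short window), the cut-off values 0.5 and 6.0, and the returned labels are assumptions chosen for this example; the specification does not fix any particular statistic or threshold.

```python
import math
from collections import Counter


def byte_entropy(block: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not block:
        return 0.0
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())


def repeat_locality(block: bytes, window: int = 256) -> float:
    """Fraction of 3-byte sequences seen again within `window` bytes."""
    total = max(len(block) - 2, 0)
    last_seen = {}
    hits = 0
    for i in range(total):
        gram = block[i:i + 3]
        prev = last_seen.get(gram)
        if prev is not None and i - prev <= window:
            hits += 1
        last_seen[gram] = i
    return hits / total if total else 0.0


def estimate_encoder(block: bytes) -> str:
    """Suggest an encoder family from simple statistics (hypothetical cut-offs)."""
    if repeat_locality(block) > 0.5:
        return "dictionary"  # nearby repetition suggests text-like data
    if byte_entropy(block) < 6.0:
        return "huffman"     # skewed symbol frequencies without repetition
    return "store"           # near-uniform bytes: little to gain
```

A block of repeated English text scores high on repetition locality and is steered toward a dictionary type encoder, whereas a block of uniformly distributed bytes is left unencoded.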

A mode of operation of the data compression system of FIGS. 17a and 17b will now be discussed with reference to the flow diagrams of FIGS. 18a-18d. The method of FIGS. 18a-18d uses a priori estimation algorithms or look-up tables to estimate the desirability or probability of using content independent data compression encoders or content dependent data compression encoders, and selects appropriate or desirable algorithms or subsets thereof based on such estimates. A data stream comprising one or more data blocks is input into the data compression system and the first data block in the stream is received (step 1800). As stated above, data compression is performed on a per data block basis. As previously stated, a data block may represent any quantity of data from a single bit through a multiplicity of files or packets and may vary from block to block. Accordingly, the first input data block in the input data stream is input into the counter module 10 that counts the size of the data block (step 1802). The data block is then stored in the buffer 20 (step 1804). The data block is then analyzed on a per block or multi-block basis by the content dependent/content independent data recognition module 1700 (step 1806). If the data stream content is not recognized utilizing the recognition list(s) or algorithm(s) module 1710 (step 1808), the data is routed to the content independent encoder module 30. An estimate of the best content independent encoders is performed (step 1850) and the appropriate encoders are enabled and initialized as applicable. The data is then compressed by each (enabled) encoder E1 . . . En (step 1810). Upon completion of the encoding of the input data block, an encoded data block is output from each (enabled) encoder E1 . . . En and maintained in a corresponding buffer (step 1812), and the encoded data block size is counted (step 1814).

Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of each encoded data block output from the enabled encoders (step 1816). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 1818). It is to be understood that the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. It is to be further understood that notwithstanding that the current limit for lossless data compression is the entropy limit (the present definition of information content) for the data, the present invention does not preclude the use of future developments in lossless data compression that may increase lossless data compression ratios beyond what is currently known within the art. Additionally, the content independent data compression threshold may be different from the content dependent threshold and either may be modified by the specific enabled encoders.

After the compression ratios are compared with the threshold, a determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 1820). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 1820), then the original unencoded input data block is selected for output and a null data compression type descriptor is appended thereto (step 1834). A null data compression type descriptor is defined as any recognizable data token or descriptor that indicates no data encoding has been applied to the input data block. Accordingly, the unencoded input data block with its corresponding null data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1836).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 1820), then the encoded data block having the greatest compression ratio is selected (step 1822). An appropriate data compression type descriptor is then appended (step 1824). A data compression type descriptor is defined as any recognizable data token or descriptor that indicates which data encoding technique has been applied to the data. It is to be understood that, since encoders of the identical type may be applied in parallel to enhance encoding speed (as discussed above), the data compression type descriptor identifies the corresponding encoding technique applied to the encoded data block, not necessarily the specific encoder. The encoded data block having the greatest compression ratio along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1826).

As previously stated, the data block stored in the buffer 20 (step 1804) is analyzed on a per block or multi-block basis by the content dependent data recognition module 1300 (step 1806). If the data stream content is recognized or estimated utilizing the recognition list(s) or algorithm(s) module 1710 (affirmative result in step 1808), the recognized data type/file or block is selected based on a list or algorithm (step 1838) and an estimate of the desirability of using the associated content dependent algorithms can be determined (step 1840). For instance, even though a recognized data type may be associated with three different encoders, an estimation of the desirability of using each encoder may result in only one or two of the encoders being actually selected for use. The data is routed to the content dependent encoder module 1320 and compressed by each (enabled) encoder D1 . . . Dm (step 1842). Upon completion of the encoding of the input data block, an encoded data block is output from each (enabled) encoder D1 . . . Dm and maintained in a corresponding buffer (step 1844), and the encoded data block size is counted (step 1846).
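
The narrowing of three associated encoders to one or two may be illustrated, under stated assumptions, by trial-encoding a short sample of the block and keeping only the most promising candidates. Sampling is merely one conceivable estimation method and is not described in the specification; the candidate set, the sample size, the function name select_encoders, and the keep parameter are all hypothetical.

```python
import bz2
import lzma
import zlib
from typing import List

# Hypothetical stand-ins for three encoders associated with a recognized type.
CANDIDATES = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def select_encoders(block: bytes, sample_size: int = 4096, keep: int = 2) -> List[str]:
    """Estimate desirability on a sample and keep at most `keep` encoders."""
    sample = block[:sample_size]
    scores = {}
    for name, encode in CANDIDATES.items():
        # Estimated compression ratio on the sample only, not the full block.
        scores[name] = len(sample) / len(encode(sample))
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Discard candidates whose estimate suggests expansion rather than compression.
    return [name for name in ranked[:keep] if scores[name] > 1.0]
```

Only the encoders returned by select_encoders would then be enabled for the full block, which trades some risk of missing the best encoder for reduced encoding work.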

Next, a compression ratio is calculated for each encoded data block by taking the ratio of the size of the input data block (as determined by the input counter 10) to the size of each encoded data block output from the enabled encoders (step 1848). Each compression ratio is then compared with an a priori-specified compression ratio threshold (step 1850). It is to be understood that the threshold limit may be specified as any value inclusive of data expansion, no data compression or expansion, or any arbitrarily desired compression limit. It is to be further understood that many of these algorithms may be lossy, and as such the limits may be subject to or modified by an end target storage, listening, or viewing device. Further, notwithstanding that the current limit for lossless data compression is the entropy limit (the present definition of information content) for the data, the present invention does not preclude the use of future developments in lossless data compression that may increase lossless data compression ratios beyond what is currently known within the art. Additionally, the content independent data compression threshold may be different from the content dependent threshold and either may be modified by the specific enabled encoders.

After the compression ratios are compared with the threshold, a determination is made as to whether the compression ratio of at least one of the encoded data blocks exceeds the threshold limit (step 1820). If there are no encoded data blocks having a compression ratio that exceeds the compression ratio threshold limit (negative determination in step 1820), then the original unencoded input data block is selected for output and a null data compression type descriptor is appended thereto (step 1834). A null data compression type descriptor is defined as any recognizable data token or descriptor that indicates no data encoding has been applied to the input data block. Accordingly, the unencoded input data block with its corresponding null data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1836).

On the other hand, if one or more of the encoded data blocks possess a compression ratio greater than the compression ratio threshold limit (affirmative result in step 1820), then the encoded data block having the greatest compression ratio is selected (step 1822). An appropriate data compression type descriptor is then appended (step 1824). A data compression type descriptor is defined as any recognizable data token or descriptor that indicates which data encoding technique has been applied to the data. It is to be understood that, since encoders of the identical type may be applied in parallel to enhance encoding speed (as discussed above), the data compression type descriptor identifies the corresponding encoding technique applied to the encoded data block, not necessarily the specific encoder. The encoded data block having the greatest compression ratio along with its corresponding data compression type descriptor is then output for subsequent data processing, storage, or transmittal (step 1826).

After the encoded data block or the unencoded input data block is output (steps 1826 and 1836), a determination is made as to whether the input data stream contains additional data blocks to be processed (step 1828). If the input data stream includes additional data blocks (affirmative result in step 1828), the next successive data block is received (step 1832), its block size is counted (return to step 1802) and the data compression process is repeated. This process is iterated for each data block in the input data stream. Once the final input data block is processed (negative result in step 1828), data compression of the input data stream is finished (step 1830).

It is to be appreciated that in the embodiments described above with reference to FIGS. 13-18, an a priori specified time limit or any other real-time requirement may be employed to achieve practical and efficient real-time operation.
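
A time limit of this kind might, for example, be enforced by checking a deadline before each encoder is started and selecting among whatever results are available once the budget is spent. The sketch below is one such arrangement offered as an assumption; the function name compress_within, the encoder ordering, and the policy of never interrupting an encoder that has already started are illustrative choices, not requirements of the specification.

```python
import bz2
import lzma
import time
import zlib
from typing import Tuple

# Hypothetical encoder set, ordered so that faster encoders are tried first.
ENCODERS = [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]


def compress_within(block: bytes, time_limit_s: float,
                    threshold: float = 1.0) -> Tuple[str, bytes]:
    """Pick the best result among encoders that start before the deadline."""
    deadline = time.monotonic() + time_limit_s
    best_name, best_out, best_ratio = "null", block, threshold
    for name, encode in ENCODERS:
        if time.monotonic() >= deadline:
            break  # budget spent: keep the best result obtained so far
        out = encode(block)
        ratio = len(block) / len(out)
        if ratio > best_ratio:
            best_name, best_out, best_ratio = name, out, ratio
    return best_name, best_out
```

With a limit of zero no encoder is started and the block is output unencoded with the null descriptor, which bounds the latency added per block.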

Although illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present invention is not limited to those precise embodiments, and that various other changes and modifications may be effected therein by one skilled in the art without departing from the scope or spirit of the invention. All such changes and modifications are intended to be included within the scope of the invention as defined by the appended claims.

What is claimed is:
 1. A system comprising: a memory; and one or more processors configured to: receive one or more data blocks; identify one or more attributes of the one or more data blocks; search for recognition information stored on the memory corresponding to the one or more attributes; compress the one or more data blocks according to the corresponding recognition information obtained from the memory to produce one or more encoded data blocks corresponding to the one or more data blocks when recognition information corresponding to the one or more attributes is found in the search of the memory; compress the one or more data blocks with a default lossless compression encoder to produce one or more compressed data blocks when the one or more attributes of the data are not found in the search of the memory; and store the one or more compressed data blocks and information regarding the one or more attributes of the one or more data blocks in the memory when the one or more attributes of the data are not found in the search of the memory.
 2. The system of claim 1, wherein the one or more processors are further configured to store the one or more encoded data blocks in the memory.
 3. The system of claim 1, wherein the one or more processors are further configured to store a data token in memory, wherein the data token corresponds to the data encoding technique used to process the one or more data blocks when the one or more attributes of the data are found in the search of the memory.
 4. The system of claim 1, wherein the one or more processors are further configured to store a data token in memory, wherein the data token corresponds to the default lossless compression encoder used to compress the one or more data blocks.
 5. The system of claim 1, wherein the default lossless compression encoder is configured to: receive one or more data blocks; compress the one or more data blocks with one or more compression techniques to produce a set of compressed one or more data blocks; calculate a figure of merit for each of the set of compressed one or more data blocks to produce a set of figures of merit corresponding to the set of compressed one or more data blocks; compare each of the figures of merit of the set of figures of merit to an a priori threshold; select for output of the default lossless compression encoder the received one or more data blocks and appending a null data type compression descriptor to the received one or more data blocks when each of the figures of merit are less than the a priori threshold; and select for output of the default lossless compression encoder the one of the set of compressed one or more data blocks having the highest figure of merit and appending a corresponding data type compression descriptor to the one of the set of compressed one or more data blocks when at least one of the figures of merit exceed the a priori threshold.
 6. The system of claim 1, wherein one or more data blocks comprise one or more complete files.
 7. The system of claim 1, wherein the identifying one or more attributes includes identifying one or more data structures of the one or more data blocks.
 8. The system of claim 1, wherein the identifying one or more attributes includes identifying one or more file substructures of the one or more data blocks.
 9. The system of claim 1, wherein the default lossless compression encoder comprises one or more lossless encoders.
 10. A method comprising: receiving one or more data blocks; identifying one or more attributes of the one or more data blocks; searching memory for recognition information corresponding to the one or more attributes; compressing the one or more data blocks according to the corresponding recognition information obtained from the memory to produce one or more encoded data blocks corresponding to the one or more data blocks when the one or more attributes of the data are found in the search of the memory; compressing the one or more data blocks with a default lossless compression encoder to produce one or more compressed data blocks when the one or more attributes of the data are not found in the search of the memory; and storing the one or more compressed data blocks and information regarding the one or more attributes of the one or more data blocks in memory when the one or more attributes of the data are not found in the search of the memory.
 11. The method of claim 10, further comprising storing the one or more encoded data blocks in memory.
 12. The method of claim 10, further comprising storing a data token in memory, wherein the data token corresponds to the data encoding technique used to process the one or more data blocks when the one or more attributes of the data are found in the search of the memory.
 13. The method of claim 10, further comprising storing a data token in memory, wherein the data token corresponds to the default lossless compression encoder used to compress the one or more data blocks.
 14. The method of claim 10, wherein the compressing the one or more data blocks with a default lossless compression encoder further comprises: receiving one or more data blocks at the default lossless compression encoder; compressing the one or more data blocks with one or more compression techniques to produce a set of compressed one or more data blocks; calculating a figure of merit for each of the set of compressed one or more data blocks to produce a set of figures of merit corresponding to the set of compressed one or more data blocks; comparing each of the figures of merit of the set of figures of merit to an a priori threshold; selecting for output of the default lossless compression encoder the received one or more data blocks and appending a null data type compression descriptor to the received one or more data blocks when each of the figures of merit are less than the a priori threshold; and selecting for output of the default lossless compression encoder the one of the set of compressed one or more data blocks having the highest figure of merit and appending a corresponding data type compression descriptor to the one of the set of compressed one or more data blocks when at least one of the figures of merit exceed the a priori threshold.
 15. The method of claim 10, wherein one or more data blocks comprise one or more complete files.
 16. The method of claim 10, wherein the identifying one or more attributes includes identifying one or more data structures of the one or more data blocks.
 17. The method of claim 10, wherein the identifying one or more attributes includes identifying one or more file substructures of the one or more data blocks.
 18. The method of claim 10, wherein the default lossless compression encoder comprises one or more lossless encoders.
 19. A non-transitory tangible machine-readable storage medium containing instructions configured to cause one or more processors to execute a process comprising: identifying one or more attributes corresponding to one or more data blocks received by the one or more processors; searching memory for recognition information corresponding to the one or more attributes; compressing the one or more data blocks according to the corresponding recognition information obtained from the memory to produce one or more encoded data blocks corresponding to the one or more data blocks when the one or more attributes of the data are found in the search of the memory; compressing the one or more data blocks with a default lossless compression encoder to produce one or more compressed data blocks when the one or more attributes of the data are not found in the search of the memory; and storing the one or more compressed data blocks and information regarding the one or more attributes of the one or more data blocks in memory when the one or more attributes of the data are not found in the search of the memory.
 20. The non-transitory machine-readable storage medium of claim 19, further comprising instructions for storing the one or more encoded data blocks in memory.
 21. The non-transitory machine-readable storage medium of claim 19, further comprising instructions for storing a data token in memory, wherein the data token corresponds to the data encoding technique used to process the one or more data blocks when the one or more attributes of the data are found in the search of the memory.
 22. The non-transitory machine-readable storage medium of claim 19, further comprising instructions for storing a data token in memory, wherein the data token corresponds to the default lossless compression encoder used to compress the one or more data blocks.
 23. The non-transitory machine-readable storage medium of claim 19, wherein the compressing the one or more data blocks with a default lossless compression encoder further comprises: receiving one or more data blocks at the default lossless compression encoder; compressing the one or more data blocks with one or more compression techniques to produce a set of compressed one or more data blocks; calculating a figure of merit for each of the set of compressed one or more data blocks to produce a set of figures of merit corresponding to the set of compressed one or more data blocks; comparing each of the figures of merit of the set of figures of merit to an a priori threshold; selecting for output of the default lossless compression encoder the received one or more data blocks and appending a null data type compression descriptor to the received one or more data blocks when each of the figures of merit are less than the a priori threshold; and selecting for output of the default lossless compression encoder the one of the set of compressed one or more data blocks having the highest figure of merit and appending a corresponding data type compression descriptor to the one of the set of compressed one or more data blocks when at least one of the figures of merit exceed the a priori threshold.
 24. The non-transitory machine-readable storage medium of claim 19, wherein one or more data blocks comprise one or more complete files.
 25. The non-transitory machine-readable storage medium of claim 19, wherein the identifying one or more attributes includes identifying one or more data structures of the one or more data blocks.
 26. The non-transitory machine-readable storage medium of claim 19, wherein the identifying one or more attributes includes identifying one or more file substructures of the one or more data blocks.
 27. The non-transitory machine-readable storage medium of claim 19, wherein the default lossless compression encoder comprises one or more lossless encoders.
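
For illustration only, the flow recited in claims 1 and 5 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the encoder table (zlib, bz2, lzma), the use of compression ratio as the figure of merit, the threshold of 1.0, the four-byte signature standing in for "attributes", and the names `identify_attributes`, `default_lossless_encode`, and `Compressor` are all hypothetical choices made for the example.

```python
import bz2
import lzma
import zlib

# Hypothetical encoder table: descriptor -> compression function (assumption).
ENCODERS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
NULL_DESCRIPTOR = "null"   # marks a block that is passed through uncompressed
THRESHOLD = 1.0            # assumed a priori threshold on the compression ratio


def default_lossless_encode(block: bytes, threshold: float = THRESHOLD):
    """Try every encoder; keep the best result only if its merit exceeds the threshold."""
    best_desc, best_out, best_merit = NULL_DESCRIPTOR, block, 0.0
    for desc, compress in ENCODERS.items():
        out = compress(block)
        merit = len(block) / len(out)  # figure of merit: compression ratio (assumption)
        if merit > best_merit:
            best_desc, best_out, best_merit = desc, out, merit
    if best_merit > threshold:
        return best_desc, best_out
    # No encoder beat the threshold: output the input with a null descriptor.
    return NULL_DESCRIPTOR, block


def identify_attributes(block: bytes) -> bytes:
    """Stand-in attribute: the first four bytes of the block (assumption)."""
    return block[:4]


class Compressor:
    def __init__(self):
        self.recognition = {}  # attributes -> descriptor of the encoder to use
        self.stored = []       # (descriptor, payload) pairs kept on the fallback path

    def compress_block(self, block: bytes):
        attrs = identify_attributes(block)
        desc = self.recognition.get(attrs)
        if desc is not None:
            # Attributes found: compress according to the recognition information.
            payload = block if desc == NULL_DESCRIPTOR else ENCODERS[desc](block)
            return desc, payload
        # Attributes not found: use the default encoder, then store the
        # compressed block and information about its attributes.
        desc, payload = default_lossless_encode(block)
        self.stored.append((desc, payload))
        self.recognition[attrs] = desc
        return desc, payload
```

In this sketch the returned descriptor plays the role of the data type compression descriptor or data token. Recording the winning descriptor as the recognition information for later blocks is one possible reading of "information regarding the one or more attributes", chosen here only to keep the example short.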